[WIP]Add Func: aclgraph_batch_size auto-adjust to different model #739


Closed
wants to merge 1 commit

Conversation

chris668899
Contributor

@chris668899 chris668899 commented Apr 30, 2025

What this PR does / why we need it?

This PR adds a new capability: aclgraph_batch_sizes now adjusts dynamically to the model. Before this PR, the aclgraph_batch_sizes list passed from vLLM to vllm-ascend was always too large, which could cause a failure on some models with the error: "The resources are insufficient".
With this PR, the code adjusts aclgraph_batch_sizes based on the model's number of hidden layers and the parallel config, for example:
a. for Qwen2.5-7B, the aclgraph_batch_size list has 33 entries in total;
b. for Qwen2.5-72B, the aclgraph_batch_size list has 11 entries in total;
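The relation between model depth and list length can be sketched with a toy memory-budget model. Everything below is an assumption for illustration (the function name, the budget, and the per-layer cost are not from this PR); it merely shows why a deeper model ends up with fewer capture entries:

```python
# Toy model: each captured aclgraph costs memory roughly proportional to
# the number of hidden layers, so a fixed graph-memory budget admits
# fewer capture entries for deeper models. Constants are illustrative.

def max_capture_entries(num_hidden_layers, budget_mb=2048.0, cost_per_layer_mb=2.2):
    """Rough count of aclgraphs that fit in the memory budget."""
    per_graph_mb = num_hidden_layers * cost_per_layer_mb
    return max(1, int(budget_mb // per_graph_mb))

print(max_capture_entries(28))  # 28-layer model (7B-class)  -> 33
print(max_capture_entries(80))  # 80-layer model (72B-class) -> 11
```

With these hand-picked constants the toy model happens to reproduce the 33/11 counts above; the PR's real heuristic also factors in the parallel config.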

parallel_type_cnt = 0
dp_size = self.vllm_config.parallel_config.data_parallel_size
tp_size = self.vllm_config.parallel_config.tensor_parallel_size
# count the active parallel-strategy types (tp branch inferred from context)
if dp_size > 1:
    parallel_type_cnt += 1
if tp_size > 1:
    parallel_type_cnt += 1
Collaborator


So the bigger the parallel size, the smaller the graph step? Shouldn't it be bigger?

Contributor Author


The types of parallel strategies influence the length of the list. Therefore, the more types of parallel strategies there are, the shorter the list becomes. However, the maximum supported batch_size value in the list remains unchanged.
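The invariant described above (a shorter list, but an unchanged maximum batch size) can be illustrated with a toy thinning function; the list values and the halving rule here are made up for the example, not taken from the PR:

```python
# Toy illustration: each extra parallel-strategy type halves the list of
# capture sizes, but the largest supported batch size is always kept.

def thin_capture_sizes(sizes, parallel_type_cnt):
    sizes = sorted(sizes)
    for _ in range(parallel_type_cnt):
        # keep every other entry, then re-add the maximum if it was dropped
        sizes = sorted(set(sizes[::2]) | {sizes[-1]})
    return sizes

base = [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64]
print(thin_capture_sizes(base, 0))  # full list, max 64
print(thin_capture_sizes(base, 2))  # shorter list, max still 64
```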

from torch_npu.op_plugin.atb._atb_ops import _register_atb_extensions
from vllm import LLM, SamplingParams

_register_atb_extensions()
Collaborator


what does this do?

Collaborator


torch_npu needs to preload atb's .so before the dynamo trace procedure.

Contributor Author


"_register_atb_extensions()" has been removed

"Qwen/Qwen2.5-0.5B-Instruct",
]

TENSOR_PARALLELS = [2]
Collaborator


This is a multicard UT; let's move it to tests/multicard to make sure it is tested as expected.

Contributor Author

@chris668899 chris668899 May 6, 2025


has been moved to multicard

@ganyi1996ppo
Collaborator

ganyi1996ppo commented May 6, 2025

Please don't merge this PR; we may still need to discuss it with the torch_npu and CANN teams. This solution neither follows the CUDA behavior nor is good for performance.

@chris668899 chris668899 changed the title [WIP]Add Func: npugraph_batch_size auto-adjust to different model [WIP]Add Func: aclgraph_batch_size auto-adjust to different model May 7, 2025
@ganyi1996ppo
Collaborator

ganyi1996ppo commented May 7, 2025

Please replace all occurrences of npugraph with aclgraph.

@ganyi1996ppo
Collaborator

ganyi1996ppo commented May 7, 2025

For now, it seems we don't have much choice here: for a large model with many layers and communication groups, we can only keep a small number of aclgraphs cached in memory. That means substantial padding may happen in many scenarios, causing performance regression. cc @wangxiyuan @Yikun
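The padding concern can be made concrete: at runtime each batch is padded up to the next captured graph size, so a sparser capture list wastes more compute per request. A minimal sketch (the capture lists and batch sizes here are illustrative, not the PR's values):

```python
import bisect

def padded_size(batch, capture_sizes):
    """Round a runtime batch size up to the next captured graph size
    (clamped to the largest captured size)."""
    sizes = sorted(capture_sizes)
    i = bisect.bisect_left(sizes, batch)
    return sizes[min(i, len(sizes) - 1)]

dense  = [1, 2, 4, 8, 16, 24, 32]   # many cached graphs
sparse = [1, 8, 32]                 # few cached graphs

print(padded_size(9, dense))   # 9 -> 16: moderate padding
print(padded_size(9, sparse))  # 9 -> 32: far more wasted compute
```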

@github-actions github-actions bot added documentation Improvements or additions to documentation module:ops labels May 7, 2025